Convolutional Neural Networks

Project: Write an Algorithm for Landmark Classification


In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following block of code will require additional functionality which you must provide. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!

Note: Once you have completed all the code implementations, you need to finalize your work by exporting the Jupyter Notebook as an HTML document. Before exporting the notebook to HTML, all the code cells need to have been run so that reviewers can see the final implementation and output. You can then export the notebook by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question X' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.

The rubric contains optional "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. If you decide to pursue the "Stand Out Suggestions", you should include the code in this Jupyter notebook.


Why We're Here

Photo sharing and photo storage services like to have location data for each photo that is uploaded. With the location data, these services can build advanced features, such as automatic suggestion of relevant tags or automatic photo organization, which help provide a compelling user experience. Although a photo's location can often be obtained by looking at the photo's metadata, many photos uploaded to these services will not have location metadata available. This can happen when, for example, the camera capturing the picture does not have GPS or if a photo's metadata is scrubbed due to privacy concerns.

If no location metadata for an image is available, one way to infer the location is to detect and classify a discernible landmark in the image. Given the large number of landmarks across the world and the immense volume of images uploaded to photo sharing services, using human judgement to classify these landmarks would not be feasible.

In this notebook, you will take the first steps towards addressing this problem by building models to automatically predict the location of the image based on any landmarks depicted in the image. At the end of this project, your code will accept any user-supplied image as input and suggest the top k most relevant landmarks from 50 possible landmarks from across the world. The image below displays a potential sample output of your finished project.

Sample landmark classification output

The Road Ahead

We break the notebook into separate steps. Feel free to use the links below to navigate the notebook.

  • Step 0: Download Datasets and Install Python Modules
  • Step 1: Create a CNN to Classify Landmarks (from Scratch)
  • Step 2: Create a CNN to Classify Landmarks (using Transfer Learning)
  • Step 3: Write Your Landmark Prediction Algorithm

Step 0: Download Datasets and Install Python Modules

Note: if you are using the Udacity workspace, YOU CAN SKIP THIS STEP. The dataset can be found in the /data folder and all required Python modules have been installed in the workspace.

Download the landmark dataset. Unzip the folder and place it in this project's home directory, at the location /landmark_images.

Install the following Python modules:

  • cv2
  • matplotlib
  • numpy
  • PIL
  • torch
  • torchvision
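A quick way to confirm the modules are available is to attempt each import and report anything missing. (A minimal sketch; note that `cv2` and `PIL` are the import names for the `opencv-python` and `Pillow` packages.)

```python
import importlib

def check_modules(names):
    """Try to import each module; return the list of names that failed."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

required = ["cv2", "matplotlib", "numpy", "PIL", "torch", "torchvision"]
print(check_modules(required) or "all modules available")
```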

Step 1: Create a CNN to Classify Landmarks (from Scratch)

In this step, you will create a CNN that classifies landmarks. You must create your CNN from scratch (so, you can't use transfer learning yet!), and you must attain a test accuracy of at least 20%.

Although 20% may seem low at first glance, it becomes more reasonable once you realize how difficult a problem this is. A photo taken at a landmark often captures a fairly mundane scene of an animal or plant, like the following picture.

Bird in Haleakalā National Park

Just by looking at that image alone, would you have been able to guess that it was taken at the Haleakalā National Park in Hawaii?

An accuracy of 20% is significantly better than random guessing, which would provide an accuracy of just 2%. In Step 2 of this notebook, you will have the opportunity to greatly improve accuracy by using transfer learning to create a CNN.
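The 2% figure is just 1-in-50; a quick simulation (a hypothetical sketch using only the standard library) confirms that uniform random guessing over 50 classes lands near that baseline:

```python
import random

random.seed(0)
n_classes, n_samples = 50, 100_000

# draw a random guess and a random "true" label independently for each sample
hits = sum(random.randrange(n_classes) == random.randrange(n_classes)
           for _ in range(n_samples))
print(f"random-guess accuracy: {hits / n_samples:.3f}")  # close to 1/50 = 0.02
```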

Remember that practice is far ahead of theory in deep learning. Experiment with many different architectures, and trust your intuition. And, of course, have fun!

(IMPLEMENTATION) Specify Data Loaders for the Landmark Dataset

Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.

Note: Remember that the dataset can be found at /data/landmark_images/ in the workspace.

All three of your data loaders should be accessible via a dictionary named loaders_scratch. Your train data loader should be at loaders_scratch['train'], your validation data loader should be at loaders_scratch['valid'], and your test data loader should be at loaders_scratch['test'].

You may find this documentation on custom datasets to be a useful resource. If you are interested in augmenting your training and/or validation data, check out the wide variety of transforms!
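However the train/validation split is produced (the cells below use `splitfolders`; a `SubsetRandomSampler` approach is also sketched in comments), it boils down to shuffling indices and slicing. A dependency-free sketch of that split, with the 80/20 ratio used below (the dataset size of 4996 comes from the `splitfolders` output further down):

```python
import random

def split_indices(n, train_ratio=0.8, seed=53):
    """Shuffle indices 0..n-1 and slice into train/validation index lists."""
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)
    cut = int(train_ratio * n)
    return idxs[:cut], idxs[cut:]

train_idxs, val_idxs = split_indices(4996)
print(len(train_idxs), len(val_idxs))  # 3996 1000
```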

In [1]:
!find /data/landmark_images -maxdepth 4 -type d
/data/landmark_images
/data/landmark_images/train
/data/landmark_images/train/49.Temple_of_Olympian_Zeus
/data/landmark_images/train/01.Mount_Rainier_National_Park
/data/landmark_images/train/22.Moscow_Raceway
/data/landmark_images/train/43.Gullfoss_Falls
/data/landmark_images/train/16.Eiffel_Tower
/data/landmark_images/train/06.Niagara_Falls
/data/landmark_images/train/12.Kantanagar_Temple
/data/landmark_images/train/18.Delicate_Arch
/data/landmark_images/train/02.Ljubljana_Castle
/data/landmark_images/train/26.Pont_du_Gard
/data/landmark_images/train/45.Temple_of_Heaven
/data/landmark_images/train/07.Stonehenge
/data/landmark_images/train/15.Central_Park
/data/landmark_images/train/23.Externsteine
/data/landmark_images/train/33.Sydney_Opera_House
/data/landmark_images/train/24.Soreq_Cave
/data/landmark_images/train/38.Forth_Bridge
/data/landmark_images/train/46.Great_Wall_of_China
/data/landmark_images/train/20.Matterhorn
/data/landmark_images/train/08.Grand_Canyon
/data/landmark_images/train/35.Monumento_a_la_Revolucion
/data/landmark_images/train/10.Edinburgh_Castle
/data/landmark_images/train/05.London_Olympic_Stadium
/data/landmark_images/train/11.Mount_Rushmore_National_Memorial
/data/landmark_images/train/41.Machu_Picchu
/data/landmark_images/train/00.Haleakala_National_Park
/data/landmark_images/train/48.Whitby_Abbey
/data/landmark_images/train/39.Gateway_of_India
/data/landmark_images/train/40.Stockholm_City_Hall
/data/landmark_images/train/19.Vienna_City_Hall
/data/landmark_images/train/17.Changdeokgung
/data/landmark_images/train/30.Brooklyn_Bridge
/data/landmark_images/train/47.Prague_Astronomical_Clock
/data/landmark_images/train/37.Atomium
/data/landmark_images/train/09.Golden_Gate_Bridge
/data/landmark_images/train/34.Great_Barrier_Reef
/data/landmark_images/train/28.Sydney_Harbour_Bridge
/data/landmark_images/train/31.Washington_Monument
/data/landmark_images/train/03.Dead_Sea
/data/landmark_images/train/04.Wroclaws_Dwarves
/data/landmark_images/train/44.Trevi_Fountain
/data/landmark_images/train/29.Petronas_Towers
/data/landmark_images/train/32.Hanging_Temple
/data/landmark_images/train/42.Death_Valley_National_Park
/data/landmark_images/train/27.Seattle_Japanese_Garden
/data/landmark_images/train/13.Yellowstone_National_Park
/data/landmark_images/train/25.Banff_National_Park
/data/landmark_images/train/21.Taj_Mahal
/data/landmark_images/train/36.Badlands_National_Park
/data/landmark_images/train/14.Terminal_Tower
/data/landmark_images/test
/data/landmark_images/test/49.Temple_of_Olympian_Zeus
/data/landmark_images/test/01.Mount_Rainier_National_Park
/data/landmark_images/test/22.Moscow_Raceway
/data/landmark_images/test/43.Gullfoss_Falls
/data/landmark_images/test/16.Eiffel_Tower
/data/landmark_images/test/06.Niagara_Falls
/data/landmark_images/test/12.Kantanagar_Temple
/data/landmark_images/test/18.Delicate_Arch
/data/landmark_images/test/02.Ljubljana_Castle
/data/landmark_images/test/26.Pont_du_Gard
/data/landmark_images/test/45.Temple_of_Heaven
/data/landmark_images/test/07.Stonehenge
/data/landmark_images/test/15.Central_Park
/data/landmark_images/test/23.Externsteine
/data/landmark_images/test/33.Sydney_Opera_House
/data/landmark_images/test/24.Soreq_Cave
/data/landmark_images/test/38.Forth_Bridge
/data/landmark_images/test/46.Great_Wall_of_China
/data/landmark_images/test/20.Matterhorn
/data/landmark_images/test/08.Grand_Canyon
/data/landmark_images/test/35.Monumento_a_la_Revolucion
/data/landmark_images/test/10.Edinburgh_Castle
/data/landmark_images/test/05.London_Olympic_Stadium
/data/landmark_images/test/11.Mount_Rushmore_National_Memorial
/data/landmark_images/test/41.Machu_Picchu
/data/landmark_images/test/00.Haleakala_National_Park
/data/landmark_images/test/48.Whitby_Abbey
/data/landmark_images/test/39.Gateway_of_India
/data/landmark_images/test/40.Stockholm_City_Hall
/data/landmark_images/test/19.Vienna_City_Hall
/data/landmark_images/test/17.Changdeokgung
/data/landmark_images/test/30.Brooklyn_Bridge
/data/landmark_images/test/47.Prague_Astronomical_Clock
/data/landmark_images/test/37.Atomium
/data/landmark_images/test/09.Golden_Gate_Bridge
/data/landmark_images/test/34.Great_Barrier_Reef
/data/landmark_images/test/28.Sydney_Harbour_Bridge
/data/landmark_images/test/31.Washington_Monument
/data/landmark_images/test/03.Dead_Sea
/data/landmark_images/test/04.Wroclaws_Dwarves
/data/landmark_images/test/44.Trevi_Fountain
/data/landmark_images/test/29.Petronas_Towers
/data/landmark_images/test/32.Hanging_Temple
/data/landmark_images/test/42.Death_Valley_National_Park
/data/landmark_images/test/27.Seattle_Japanese_Garden
/data/landmark_images/test/13.Yellowstone_National_Park
/data/landmark_images/test/25.Banff_National_Park
/data/landmark_images/test/21.Taj_Mahal
/data/landmark_images/test/36.Badlands_National_Park
/data/landmark_images/test/14.Terminal_Tower
In [2]:
import numpy as np
import torch
from torch.utils import data
from torchvision import transforms
from torchvision.datasets import ImageFolder
import torch.nn as nn
import os
In [3]:
!pip install split-folders
Collecting split-folders
  Downloading https://files.pythonhosted.org/packages/b8/5f/3c2b2f7ea5e047c8cdc3bb00ae582c5438fcdbbedcc23b3cc1c2c7aae642/split_folders-0.4.3-py3-none-any.whl
Installing collected packages: split-folders
Successfully installed split-folders-0.4.3
In [4]:
import splitfolders
In [13]:
# splitfolders.ratio("/data/landmark_images/train", output="./output", seed=53, ratio=(.8, .2), group_prefix=None) # default values
Copying files: 4996 files [00:14, 356.79 files/s]
In [3]:
!ls output
train  val
In [3]:
### TODO: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes


DS_DIR = "/data/landmark_images"
BATCH_SIZE = 2**6

IMG_MEAN = torch.tensor([0.485, 0.456, 0.406])
IMG_STD = torch.tensor([0.229, 0.224, 0.225])

normalize = transforms.Normalize(mean=IMG_MEAN, std=IMG_STD)

train_transforms = transforms.Compose([
        transforms.Resize(226),
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(0.25),
        transforms.RandomVerticalFlip(0.25),
        transforms.ToTensor(),
        normalize
])

test_val_transforms = transforms.Compose([
        transforms.Resize(226),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize
])



train_ds = ImageFolder("./output/train", transform=train_transforms)
val_ds = ImageFolder("./output/val", transform=test_val_transforms)

test_ds = ImageFolder(os.path.join(DS_DIR, 'test'), transform=test_val_transforms)


# # Train-Val split
# train_ratio = 0.8
# train_split_idx = int(train_ratio * len(full_ds))
# idxs = np.arange(len(full_ds))
# np.random.shuffle(idxs)
# train_idxs = idxs[:train_split_idx]
# val_idxs = idxs[train_split_idx:]
# train_sampler = data.sampler.SubsetRandomSampler(train_idxs)
# val_sampler = data.sampler.SubsetRandomSampler(val_idxs)

#train_ds, val_ds = data.random_split(full_ds, [train_size, val_size]), not supported in v. 0.4.0 :( 



train_loader = data.DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
val_loader = data.DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False)
test_loader = data.DataLoader(test_ds, batch_size=1, shuffle=False)


loaders_scratch = {'train': train_loader, 'valid': val_loader, 'test': test_loader}
In [4]:
len(train_ds.classes)
Out[4]:
52
In [5]:
len(val_ds.classes)
Out[5]:
52
In [6]:
len(test_ds.classes)
Out[6]:
50
In [23]:
# calculates the mean and std of the whole training dataset

# ds_len = len(train_ds)
# mean, std = 0,0
# for i,y in train_ds:
#     mean += i.mean(1).mean(1)
#     std += i.std(1).std(1)
# mean /= ds_len
# std /= ds_len

# print([round(n, 5) for n in mean.tolist()])
# print([round(n, 5) for n in std.tolist()])
[-0.069, 0.09211, 0.2922]
[0.16686, 0.16738, 0.17752]

Question 1: Describe your chosen procedure for preprocessing the data.

  • How does your code resize the images (by cropping, stretching, etc)? What size did you pick for the input tensor, and why?
  • Did you decide to augment the dataset? If so, how (through translations, flips, rotations, etc)? If not, why not?

Answer:

  1. Each image is first resized so its shorter side is 226px, then randomly cropped to 224x224px, so the input tensor is 3x224x224. The 224x224 size is the standard ImageNet input size and balances computational cost against information retention.
  2. Yes: the training data is augmented with random horizontal and vertical flips (each with probability 0.25), and every image is normalized using the mean and standard deviation of the ImageNet dataset.

(IMPLEMENTATION) Visualize a Batch of Training Data

Use the code cell below to retrieve a batch of images from your train data loader, display at least 5 images simultaneously, and label each displayed image with its class name (e.g., "Golden Gate Bridge").

Visualizing the output of your data loader is a great way to ensure that your data loading and preprocessing are working as expected.

In [7]:
def denormalize_img(x):
    x[0, :, :] = x[ 0, :, :] * IMG_STD[0] + IMG_MEAN[0]
    x[1, :, :] = x[ 1, :, :] * IMG_STD[1] + IMG_MEAN[1]
    x[2, :, :] = x[ 2, :, :] * IMG_STD[2] + IMG_MEAN[2]
    return x
In [9]:
import matplotlib.pyplot as plt
%matplotlib inline

idx_to_class = dict(zip(train_ds.class_to_idx.values(), train_ds.class_to_idx.keys()))
## TODO: visualize a batch of the train data loader
fig, axs = plt.subplots(5, 5, figsize=(30, 25))

## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)

temp_ds,y = next(iter(train_loader))
for i, a in enumerate(axs.flatten()):
    norm_img = temp_ds[i]
    img = denormalize_img(norm_img)
    img = img.permute(1, 2, 0) 
    a.imshow(img.numpy())
    a.set_title(f'{idx_to_class[y[i].item()][3: ].replace("_", " ") }')
    a.axis('off')
    
    
fig.tight_layout()

Initialize use_cuda variable

In [11]:
# useful variable that tells us whether we should use the GPU
use_cuda = torch.cuda.is_available()

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_scratch, and fill in the function get_optimizer_scratch below.

In [12]:
## TODO: select loss function
criterion_scratch = nn.CrossEntropyLoss()

def get_optimizer_scratch(model):
    return torch.optim.Adam(model.parameters(), lr=1e-3)
    
    

(IMPLEMENTATION) Model Architecture

Create a CNN to classify images of landmarks. Use the template in the code cell below.

  1. Conv; $[3x224x224]$ --> $[16x224x224]$ (3x3)
  2. MaxPool; $[16x224x224]$ --> $[16x112x112]$ (2x2)
  3. Conv; $[16x112x112]$ --> $[32x112x112]$ (3x3)
  4. MaxPool; $[32x112x112]$ --> $[32x56x56]$ (2x2)
  5. Conv; $[32x56x56]$ --> $[40x56x56]$ (3x3)
  6. MaxPool; $[40x56x56]$ --> $[40x28x28]$ (2x2)
  7. Conv; $[40x28x28]$ --> $[64x28x28]$ (3x3)
  8. MaxPool; $[64x28x28]$ --> $[64x14x14]$ (2x2)
  9. Conv; $[64x14x14]$ --> $[70x14x14]$ (3x3)
  10. MaxPool; $[70x14x14]$ --> $[70x7x7]$ (2x2)
In [38]:
import torch.nn as nn

# define the CNN architecture
class Net(nn.Module):
    ## TODO: choose an architecture, and complete the class
    def __init__(self):
        super(Net, self).__init__()
        
        ## Define layers of a CNN
        self.stack =  nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2,2),
            
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2,2),
            
            nn.Conv2d(32, 40, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2,2),
            
            nn.Conv2d(40, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2,2),
            
            nn.Conv2d(64, 70, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2,2),
        )
        
        self.fc = nn.Sequential(
            nn.Linear(7**2 * 70, 4096),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(4096, 50),
    
        )

        
        
    
    def forward(self, x):
        ## Define forward behavior
        x = self.stack(x)
        x = x.view(-1, 7**2 * 70)
        
        x = self.fc(x)
        return x

#-#-# Do NOT modify the code below this line. #-#-#

# instantiate the CNN
model_scratch = Net()

# move tensors to GPU if CUDA is available
if use_cuda:
    model_scratch.cuda()

Question 2: Outline the steps you took to get to your final CNN architecture and your reasoning at each step.

Answer: The network progressively increases depth while shrinking the spatial resolution of the feature maps. In each of the five blocks, a 3x3 convolution increases the number of channels to extract richer patterns, and a 2x2 max-pool layer then downsamples the feature map by half. The resulting 70x7x7 volume is flattened and passed through a fully connected head (with dropout for regularization) to produce the 50 class scores.
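The spatial bookkeeping behind the `7**2 * 70` flatten size can be checked by hand: each 3x3 convolution with padding 1 and stride 1 preserves height and width, and each 2x2 max pool halves them, so five blocks take 224 down to 7. A small sketch of that arithmetic:

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output size of a square convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2):
    """Output size of a non-overlapping max pool."""
    return size // kernel

size, channels = 224, [16, 32, 40, 64, 70]
for c in channels:
    size = pool_out(conv_out(size))  # conv keeps the size, pool halves it

print(size, size * size * channels[-1])  # 7 3430 -> matches nn.Linear(7**2 * 70, ...)
```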

(IMPLEMENTATION) Implement the Training Algorithm

Implement your training algorithm in the code cell below. Save the final model parameters at the filepath stored in the variable save_path.

In [39]:
def train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path):
    """returns trained model"""
    # initialize tracker for minimum validation loss
    valid_loss_min = np.Inf 
    
    for epoch in range(1, n_epochs+1):
        # initialize variables to monitor training and validation loss
        train_loss = 0.0
        valid_loss = 0.0
        
        ###################
        # train the model #
        ###################
        # set the module to training mode
        model.train()
        for batch_idx, (data, target) in enumerate(loaders['train']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()
            

            ## TODO: find the loss and update the model parameters accordingly
            ## record the average training loss, using something like
            ## train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - train_loss))
            out = model(data)

            optimizer.zero_grad()
            loss = criterion(out, target)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() / len(target)

            # free the batch tensors before loading the next one
            del data, target
            torch.cuda.empty_cache()

        ######################    
        # validate the model #
        ######################
        # set the model to evaluation mode
        model.eval()
        for batch_idx, (data, target) in enumerate(loaders['valid']):
            # move to GPU
            if use_cuda:
                data, target = data.cuda(), target.cuda()

            ## TODO: update average validation loss 

            out = model(data)
            loss = criterion(out, target)
            valid_loss += loss.item() / len(target)
            del data
            del target
            torch.cuda.empty_cache()
            

        # print training/validation statistics 
        print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
            epoch, 
            train_loss,
            valid_loss
            ))

        ## TODO: if the validation loss has decreased, save the model at the filepath stored in save_path
        if (valid_loss < valid_loss_min):
            torch.save(model.state_dict(), save_path)
            valid_loss_min = valid_loss
            print('Model saved!')

        
        
    return model

(IMPLEMENTATION) Experiment with the Weight Initialization

Use the code cell below to define a custom weight initialization, and then train with your weight initialization for a few epochs. Make sure that neither the training loss nor validation loss is nan.

Later on, you will be able to see how this compares to training with PyTorch's default weight initialization.

In [40]:
def custom_weight_init(m):
    ## TODO: implement a weight initialization strategy
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(1e-3)
    
    

#-#-# Do NOT modify the code below this line. #-#-#
    
model_scratch.apply(custom_weight_init)
model_scratch = train(20, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch),
                      criterion_scratch, use_cuda, 'ignore.pt')
Epoch: 1 	Training Loss: 3.886930 	Validation Loss: 0.988128
Model saved!
Epoch: 2 	Training Loss: 3.722304 	Validation Loss: 0.950881
Model saved!
Epoch: 3 	Training Loss: 3.583906 	Validation Loss: 0.899724
Model saved!
Epoch: 4 	Training Loss: 3.395407 	Validation Loss: 0.858400
Model saved!
Epoch: 5 	Training Loss: 3.233294 	Validation Loss: 0.831892
Model saved!
Epoch: 6 	Training Loss: 3.116887 	Validation Loss: 0.807176
Model saved!
Epoch: 7 	Training Loss: 2.952111 	Validation Loss: 0.779976
Model saved!
Epoch: 8 	Training Loss: 2.800899 	Validation Loss: 0.765407
Model saved!
Epoch: 9 	Training Loss: 2.649984 	Validation Loss: 0.755081
Model saved!
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-40-32d05be2173a> in <module>()
     11 model_scratch.apply(custom_weight_init)
     12 model_scratch = train(20, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch),
---> 13                       criterion_scratch, use_cuda, 'ignore.pt')

<ipython-input-39-0f73181e328a> in train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path)
     14         # set the module to training mode
     15         model.train()
---> 16         for batch_idx, (data, target) in enumerate(loaders['train']):
     17             # move to GPU
     18             if use_cuda:

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    262         if self.num_workers == 0:  # same-process loading
    263             indices = next(self.sample_iter)  # may raise StopIteration
--> 264             batch = self.collate_fn([self.dataset[i] for i in indices])
    265             if self.pin_memory:
    266                 batch = pin_memory_batch(batch)

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in <listcomp>(.0)
    262         if self.num_workers == 0:  # same-process loading
    263             indices = next(self.sample_iter)  # may raise StopIteration
--> 264             batch = self.collate_fn([self.dataset[i] for i in indices])
    265             if self.pin_memory:
    266                 batch = pin_memory_batch(batch)

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/folder.py in __getitem__(self, index)
    101         sample = self.loader(path)
    102         if self.transform is not None:
--> 103             sample = self.transform(sample)
    104         if self.target_transform is not None:
    105             target = self.target_transform(target)

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py in __call__(self, img)
     47     def __call__(self, img):
     48         for t in self.transforms:
---> 49             img = t(img)
     50         return img
     51 

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py in __call__(self, img)
    173             PIL Image: Rescaled image.
    174         """
--> 175         return F.resize(img, self.size, self.interpolation)
    176 
    177     def __repr__(self):

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/functional.py in resize(img, size, interpolation)
    202             oh = size
    203             ow = int(size * w / h)
--> 204             return img.resize((ow, oh), interpolation)
    205     else:
    206         return img.resize(size[::-1], interpolation)

/opt/conda/lib/python3.6/site-packages/PIL/Image.py in resize(self, size, resample, box)
   1763         self.load()
   1764 
-> 1765         return self._new(self.im.resize(size, resample, box))
   1766 
   1767     def rotate(self, angle, resample=NEAREST, expand=0, center=None,

KeyboardInterrupt: 

(IMPLEMENTATION) Train and Validate the Model

Run the next code cell to train your model.

In [41]:
## TODO: you may change the number of epochs if you'd like,
## but changing it is not required
num_epochs = 100

#-#-# Do NOT modify the code below this line. #-#-#

# function to re-initialize a model with pytorch's default weight initialization
def default_weight_init(m):
    reset_parameters = getattr(m, 'reset_parameters', None)
    if callable(reset_parameters):
        m.reset_parameters()

# reset the model parameters
model_scratch.apply(default_weight_init)

# train the model
model_scratch = train(num_epochs, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch), 
                      criterion_scratch, use_cuda, 'model_scratch.pt')
Epoch: 1 	Training Loss: 3.887769 	Validation Loss: 0.987728
Model saved!
Epoch: 2 	Training Loss: 3.749686 	Validation Loss: 0.964379
Model saved!
Epoch: 3 	Training Loss: 3.688260 	Validation Loss: 0.954351
Model saved!
Epoch: 4 	Training Loss: 3.610213 	Validation Loss: 0.929111
Model saved!
Epoch: 5 	Training Loss: 3.514282 	Validation Loss: 0.887158
Model saved!
Epoch: 6 	Training Loss: 3.347048 	Validation Loss: 0.860818
Model saved!
Epoch: 7 	Training Loss: 3.163690 	Validation Loss: 0.836342
Model saved!
Epoch: 8 	Training Loss: 3.011985 	Validation Loss: 0.797206
Model saved!
Epoch: 9 	Training Loss: 2.886255 	Validation Loss: 0.768595
Model saved!
Epoch: 10 	Training Loss: 2.690659 	Validation Loss: 0.735869
Model saved!
Epoch: 11 	Training Loss: 2.557183 	Validation Loss: 0.727566
Model saved!
Epoch: 12 	Training Loss: 2.386147 	Validation Loss: 0.688280
Model saved!
Epoch: 13 	Training Loss: 2.260008 	Validation Loss: 0.687843
Model saved!
Epoch: 14 	Training Loss: 2.156144 	Validation Loss: 0.680108
Model saved!
Epoch: 15 	Training Loss: 2.025177 	Validation Loss: 0.680185
Epoch: 16 	Training Loss: 1.923316 	Validation Loss: 0.683346
Epoch: 17 	Training Loss: 1.802299 	Validation Loss: 0.688268
Epoch: 18 	Training Loss: 1.693495 	Validation Loss: 0.707195
Epoch: 19 	Training Loss: 1.606106 	Validation Loss: 0.691423
Epoch: 20 	Training Loss: 1.450898 	Validation Loss: 0.721725
Epoch: 21 	Training Loss: 1.411406 	Validation Loss: 0.691227
Epoch: 22 	Training Loss: 1.332183 	Validation Loss: 0.727931
Epoch: 23 	Training Loss: 1.235636 	Validation Loss: 0.745666
Epoch: 24 	Training Loss: 1.187213 	Validation Loss: 0.775120
Epoch: 25 	Training Loss: 1.142313 	Validation Loss: 0.757462
Epoch: 26 	Training Loss: 1.092679 	Validation Loss: 0.750470
Epoch: 27 	Training Loss: 1.011855 	Validation Loss: 0.805013
Epoch: 28 	Training Loss: 0.974265 	Validation Loss: 0.818395
Epoch: 29 	Training Loss: 0.961382 	Validation Loss: 0.837920
Epoch: 30 	Training Loss: 0.877042 	Validation Loss: 0.859307
Epoch: 31 	Training Loss: 0.850927 	Validation Loss: 0.797085
Epoch: 32 	Training Loss: 0.805454 	Validation Loss: 0.866395
Epoch: 33 	Training Loss: 0.783623 	Validation Loss: 0.861317
Epoch: 34 	Training Loss: 0.739794 	Validation Loss: 0.842438
Epoch: 35 	Training Loss: 0.734466 	Validation Loss: 0.898400
Epoch: 36 	Training Loss: 0.657999 	Validation Loss: 0.887767
Epoch: 37 	Training Loss: 0.668607 	Validation Loss: 0.933197
Epoch: 38 	Training Loss: 0.650235 	Validation Loss: 0.854772
Epoch: 39 	Training Loss: 0.579325 	Validation Loss: 0.858530
Epoch: 40 	Training Loss: 0.563114 	Validation Loss: 0.882334
Epoch: 41 	Training Loss: 0.545057 	Validation Loss: 0.938011
Epoch: 42 	Training Loss: 0.554101 	Validation Loss: 0.949364
Epoch: 43 	Training Loss: 0.510750 	Validation Loss: 0.987225
Epoch: 44 	Training Loss: 0.519451 	Validation Loss: 0.973133
Epoch: 45 	Training Loss: 0.482701 	Validation Loss: 0.944053
Epoch: 46 	Training Loss: 0.481163 	Validation Loss: 0.986054
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-41-234244f91f3a> in <module>()
     16 # train the model
     17 model_scratch = train(num_epochs, loaders_scratch, model_scratch, get_optimizer_scratch(model_scratch), 
---> 18                       criterion_scratch, use_cuda, 'model_scratch.pt')

<ipython-input-39-0f73181e328a> in train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path)
     14         # set the module to training mode
     15         model.train()
---> 16         for batch_idx, (data, target) in enumerate(loaders['train']):
     17             # move to GPU
     18             if use_cuda:

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    262         if self.num_workers == 0:  # same-process loading
    263             indices = next(self.sample_iter)  # may raise StopIteration
--> 264             batch = self.collate_fn([self.dataset[i] for i in indices])
    265             if self.pin_memory:
    266                 batch = pin_memory_batch(batch)

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in <listcomp>(.0)
    262         if self.num_workers == 0:  # same-process loading
    263             indices = next(self.sample_iter)  # may raise StopIteration
--> 264             batch = self.collate_fn([self.dataset[i] for i in indices])
    265             if self.pin_memory:
    266                 batch = pin_memory_batch(batch)

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/folder.py in __getitem__(self, index)
     99         """
    100         path, target = self.samples[index]
--> 101         sample = self.loader(path)
    102         if self.transform is not None:
    103             sample = self.transform(sample)

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/folder.py in default_loader(path)
    145         return accimage_loader(path)
    146     else:
--> 147         return pil_loader(path)
    148 
    149 

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/folder.py in pil_loader(path)
    128     with open(path, 'rb') as f:
    129         img = Image.open(f)
--> 130         return img.convert('RGB')
    131 
    132 

/opt/conda/lib/python3.6/site-packages/PIL/Image.py in convert(self, mode, matrix, dither, palette, colors)
    890         """
    891 
--> 892         self.load()
    893 
    894         if not mode and self.mode == "P":

/opt/conda/lib/python3.6/site-packages/PIL/ImageFile.py in load(self)
    233 
    234                             b = b + s
--> 235                             n, err_code = decoder.decode(b)
    236                             if n < 0:
    237                                 break

KeyboardInterrupt: 

(IMPLEMENTATION) Test the Model

Try out your model on the test dataset of landmark images. Run the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 20%.

In [42]:
def test(loaders, model, criterion, use_cuda):

    # monitor test loss and accuracy
    test_loss = 0.
    correct = 0.
    total = 0.

    # set the module to evaluation mode
    model.eval()

    for batch_idx, (data, target) in enumerate(loaders['test']):
        # move to GPU
        if use_cuda:
            data, target = data.cuda(), target.cuda()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update average test loss 
        test_loss = test_loss + ((1 / (batch_idx + 1)) * (loss.data.item() - test_loss))
        # convert output probabilities to predicted class
        pred = output.data.max(1, keepdim=True)[1]
        # compare predictions to true label
        correct += np.sum(np.squeeze(pred.eq(target.data.view_as(pred))).cpu().numpy())
        total += data.size(0)
        
        del data
        del target
        torch.cuda.empty_cache()
            
    print('Test Loss: {:.6f}\n'.format(test_loss))

    print('\nTest Accuracy: %2d%% (%2d/%2d)' % (
        100. * correct / total, correct, total))

# load the model that got the best validation accuracy
model_scratch.load_state_dict(torch.load('model_scratch.pt'))
test(loaders_scratch, model_scratch, criterion_scratch, use_cuda)
Test Loss: 2.398817


Test Accuracy: 38% (486/1250)

Step 2: Create a CNN to Classify Landmarks (using Transfer Learning)

You will now use transfer learning to create a CNN that can identify landmarks from images. Your CNN must attain at least 60% accuracy on the test set.

(IMPLEMENTATION) Specify Data Loaders for the Landmark Dataset

Use the code cell below to create three separate data loaders: one for training data, one for validation data, and one for test data. Randomly split the images located at landmark_images/train to create the train and validation data loaders, and use the images located at landmark_images/test to create the test data loader.

All three of your data loaders should be accessible via a dictionary named loaders_transfer. Your train data loader should be at loaders_transfer['train'], your validation data loader should be at loaders_transfer['valid'], and your test data loader should be at loaders_transfer['test'].

If you like, you are welcome to use the same data loaders from the previous step, when you created a CNN from scratch.

In [43]:
### TODO: Write data loaders for training, validation, and test sets
## Specify appropriate transforms, and batch_sizes

loaders_transfer = loaders_scratch.copy()

(IMPLEMENTATION) Specify Loss Function and Optimizer

Use the next code cell to specify a loss function and optimizer. Save the chosen loss function as criterion_transfer, and fill in the function get_optimizer_transfer below.

In [72]:
## TODO: select loss function
criterion_transfer = nn.CrossEntropyLoss()


def get_optimizer_transfer(model):
    # optimize only the new, unfrozen fully connected layer
    return torch.optim.Adam(model.fc.parameters(), lr=1e-3)

    
    

(IMPLEMENTATION) Model Architecture

Use transfer learning to create a CNN to classify images of landmarks. Use the code cell below, and save your initialized model as the variable model_transfer.

In [73]:
## TODO: Specify model architecture
from torchvision import models

model_transfer = models.resnet18(pretrained=True)
#freeze weights
for param in model_transfer.parameters():
    param.requires_grad = False
    
model_transfer.fc = nn.Linear(model_transfer.fc.in_features, 50)


#-#-# Do NOT modify the code below this line. #-#-#

if use_cuda:
    model_transfer = model_transfer.cuda()
In [74]:
model_transfer
Out[74]:
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0)
  (fc): Linear(in_features=512, out_features=50, bias=True)
)

Question 3: Outline the steps you took to get to your final CNN architecture and your reasoning at each step. Describe why you think the architecture is suitable for the current problem.

Answer: I selected ResNet18 because its residual connections mitigate the vanishing gradient problem. I froze the weights of the pre-trained ResNet18 and replaced only the fully connected layer so that its output size matches our number of classes, training just that layer. Since the model was pre-trained on a much larger set of images, its learned features suit this problem well.

(IMPLEMENTATION) Train and Validate the Model

Train and validate your model in the code cell below. Save the final model parameters at filepath 'model_transfer.pt'.

In [75]:
# TODO: train the model and save the best model parameters at filepath 'model_transfer.pt'
num_epochs = 100
model_transfer = train(num_epochs, loaders_transfer, model_transfer, get_optimizer_transfer(model_transfer), 
                       criterion_transfer, use_cuda, 'model_transfer.pt')


#-#-# Do NOT modify the code below this line. #-#-#

# load the model that got the best validation accuracy
model_transfer.load_state_dict(torch.load('model_transfer.pt'))
Epoch: 1 	Training Loss: 3.216906 	Validation Loss: 0.616971
Model saved!
Epoch: 2 	Training Loss: 2.098081 	Validation Loss: 0.469518
Model saved!
Epoch: 3 	Training Loss: 1.658699 	Validation Loss: 0.400297
Model saved!
Epoch: 4 	Training Loss: 1.433890 	Validation Loss: 0.367847
Model saved!
Epoch: 5 	Training Loss: 1.281467 	Validation Loss: 0.343426
Model saved!
Epoch: 6 	Training Loss: 1.203341 	Validation Loss: 0.334668
Model saved!
Epoch: 7 	Training Loss: 1.121934 	Validation Loss: 0.318321
Model saved!
Epoch: 8 	Training Loss: 1.056530 	Validation Loss: 0.309358
Model saved!
Epoch: 9 	Training Loss: 0.991926 	Validation Loss: 0.304979
Model saved!
Epoch: 10 	Training Loss: 0.946582 	Validation Loss: 0.299259
Model saved!
Epoch: 11 	Training Loss: 0.903529 	Validation Loss: 0.298498
Model saved!
Epoch: 12 	Training Loss: 0.895552 	Validation Loss: 0.290278
Model saved!
Epoch: 13 	Training Loss: 0.848252 	Validation Loss: 0.294960
Epoch: 14 	Training Loss: 0.823355 	Validation Loss: 0.284505
Model saved!
Epoch: 15 	Training Loss: 0.790771 	Validation Loss: 0.282548
Model saved!
Epoch: 16 	Training Loss: 0.791361 	Validation Loss: 0.291072
Epoch: 17 	Training Loss: 0.758257 	Validation Loss: 0.282213
Model saved!
Epoch: 18 	Training Loss: 0.755935 	Validation Loss: 0.285468
Epoch: 19 	Training Loss: 0.715206 	Validation Loss: 0.283561
Epoch: 20 	Training Loss: 0.715459 	Validation Loss: 0.281821
Model saved!
Epoch: 21 	Training Loss: 0.679872 	Validation Loss: 0.282912
Epoch: 22 	Training Loss: 0.687993 	Validation Loss: 0.281980
Epoch: 23 	Training Loss: 0.650675 	Validation Loss: 0.277016
Model saved!
Epoch: 24 	Training Loss: 0.665364 	Validation Loss: 0.281693
Epoch: 25 	Training Loss: 0.636570 	Validation Loss: 0.278210
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-75-bb67092b0ee9> in <module>()
      2 num_epochs=100
      3 model_scratch = train(num_epochs, loaders_transfer, model_transfer, get_optimizer_transfer(model_transfer), 
----> 4                       criterion_transfer, use_cuda, 'model_transfer.pt')
      5 
      6 

<ipython-input-39-0f73181e328a> in train(n_epochs, loaders, model, optimizer, criterion, use_cuda, save_path)
     14         # set the module to training mode
     15         model.train()
---> 16         for batch_idx, (data, target) in enumerate(loaders['train']):
     17             # move to GPU
     18             if use_cuda:

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
    262         if self.num_workers == 0:  # same-process loading
    263             indices = next(self.sample_iter)  # may raise StopIteration
--> 264             batch = self.collate_fn([self.dataset[i] for i in indices])
    265             if self.pin_memory:
    266                 batch = pin_memory_batch(batch)

/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in <listcomp>(.0)
    262         if self.num_workers == 0:  # same-process loading
    263             indices = next(self.sample_iter)  # may raise StopIteration
--> 264             batch = self.collate_fn([self.dataset[i] for i in indices])
    265             if self.pin_memory:
    266                 batch = pin_memory_batch(batch)

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/folder.py in __getitem__(self, index)
     99         """
    100         path, target = self.samples[index]
--> 101         sample = self.loader(path)
    102         if self.transform is not None:
    103             sample = self.transform(sample)

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/folder.py in default_loader(path)
    145         return accimage_loader(path)
    146     else:
--> 147         return pil_loader(path)
    148 
    149 

/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/datasets/folder.py in pil_loader(path)
    128     with open(path, 'rb') as f:
    129         img = Image.open(f)
--> 130         return img.convert('RGB')
    131 
    132 

/opt/conda/lib/python3.6/site-packages/PIL/Image.py in convert(self, mode, matrix, dither, palette, colors)
    890         """
    891 
--> 892         self.load()
    893 
    894         if not mode and self.mode == "P":

/opt/conda/lib/python3.6/site-packages/PIL/ImageFile.py in load(self)
    233 
    234                             b = b + s
--> 235                             n, err_code = decoder.decode(b)
    236                             if n < 0:
    237                                 break

KeyboardInterrupt: 

(IMPLEMENTATION) Test the Model

Try out your model on the test dataset of landmark images. Use the code cell below to calculate and print the test loss and accuracy. Ensure that your test accuracy is greater than 60%.

In [76]:
test(loaders_transfer, model_transfer, criterion_transfer, use_cuda)
Test Loss: 0.935974


Test Accuracy: 74% (937/1250)

Step 3: Write Your Landmark Prediction Algorithm

Great job creating your CNN models! Now that you have put in all the hard work of creating accurate classifiers, let's define some functions to make it easy for others to use your classifiers.

(IMPLEMENTATION) Write Your Algorithm, Part 1

Implement the function predict_landmarks, which accepts a file path to an image and an integer k, and then predicts the top k most likely landmarks. You are required to use your transfer learned CNN from Step 2 to predict the landmarks.

An example of the expected behavior of predict_landmarks:

>>> predicted_landmarks = predict_landmarks('example_image.jpg', 3)
>>> print(predicted_landmarks)
['Golden Gate Bridge', 'Brooklyn Bridge', 'Sydney Harbour Bridge']
In [81]:
test_transforms = transforms.Compose([
        transforms.Resize(226),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize
])
In [83]:
from PIL import Image

## the class names can be accessed at the `classes` attribute
## of your dataset object (e.g., `train_dataset.classes`)

def predict_landmarks(img_path, k):
    ## TODO: return the names of the top k landmarks predicted by the transfer learned CNN
    model_transfer.eval()
    img = Image.open(img_path).convert('RGB')
    img = test_transforms(img).unsqueeze(0)
    if use_cuda:
        img = img.cuda()
    # no gradients needed for inference
    with torch.no_grad():
        output = model_transfer(img)
    # indices of the k highest-scoring classes
    idxs = torch.topk(output, k)[1][0].tolist()
    # turn folder names like '09.Golden_Gate_Bridge' into 'Golden Gate Bridge'
    classes = list(train_ds.class_to_idx.keys())
    return [''.join(s for s in classes[i].replace('_', ' ') if s.isalpha() or s == ' ') for i in idxs]

# test on a sample image
predict_landmarks('images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg', 5)
Out[83]:
['Golden Gate Bridge',
 'Brooklyn Bridge',
 'Forth Bridge',
 'Sydney Harbour Bridge',
 'Sydney Opera House']

(IMPLEMENTATION) Write Your Algorithm, Part 2

In the code cell below, implement the function suggest_locations, which accepts a file path to an image as input, and then displays the image and the top 3 most likely landmarks as predicted by predict_landmarks.

Some sample output for suggest_locations is provided below, but feel free to design your own user experience!

In [84]:
def suggest_locations(img_path):
    # get landmark predictions
    predicted_landmarks = predict_landmarks(img_path, 3)
    text = ', '.join(predicted_landmarks[:-1]) + ' or ' + predicted_landmarks[-1]
    
    ## TODO: display image and display landmark predictions
    fig, ax = plt.subplots()
    img = Image.open(img_path)
    ax.imshow(img)
    ax.set_xlabel(f'Is this a picture of the\n {text}?')
    plt.show()
    
    

# test on a sample image
suggest_locations('images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg')

(IMPLEMENTATION) Test Your Algorithm

Test your algorithm by running the suggest_locations function on at least four images on your computer. Feel free to use any images you like.

Question 4: Is the output better than you expected :) ? Or worse :( ? Provide at least three possible points of improvement for your algorithm.

In [85]:
!wget https://www.nyc.gov/html/dot/images/infrastructure/brooklyn-bridge.jpg
--2021-08-04 14:11:56--  https://www.nyc.gov/html/dot/images/infrastructure/brooklyn-bridge.jpg
Resolving www.nyc.gov (www.nyc.gov)... 23.211.66.222, 2600:1407:3c00:791::1500, 2600:1407:3c00:783::1500
Connecting to www.nyc.gov (www.nyc.gov)|23.211.66.222|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 362293 (354K) [image/jpeg]
Saving to: ‘brooklyn-bridge.jpg.1’

brooklyn-bridge.jpg 100%[===================>] 353.80K  --.-KB/s    in 0.1s    

2021-08-04 14:11:57 (3.19 MB/s) - ‘brooklyn-bridge.jpg.1’ saved [362293/362293]

In [86]:
!wget https://www.detail-online.com/fileadmin/uploads/olympic_Nachtteaser1_Olympic_Stadium_CGI_01.jpg
--2021-08-04 14:11:58--  https://www.detail-online.com/fileadmin/uploads/olympic_Nachtteaser1_Olympic_Stadium_CGI_01.jpg
Resolving www.detail-online.com (www.detail-online.com)... 188.40.58.46
Connecting to www.detail-online.com (www.detail-online.com)|188.40.58.46|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 493326 (482K) [image/jpeg]
Saving to: ‘olympic_Nachtteaser1_Olympic_Stadium_CGI_01.jpg.1’

olympic_Nachtteaser 100%[===================>] 481.76K  1.07MB/s    in 0.4s    

2021-08-04 14:11:59 (1.07 MB/s) - ‘olympic_Nachtteaser1_Olympic_Stadium_CGI_01.jpg.1’ saved [493326/493326]

Answer: (Three possible points of improvement)

  1. Filter the input: keep only relevant images, and discard images unrelated to the scene, such as close-ups of animals or people.
  2. Increase the volume of training images, for example through additional data collection or augmentation.
  3. Add an extra class for unknown locations, so the algorithm responds sensibly when the user provides a picture of a place outside the 50 known landmarks.
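The third point could be prototyped without retraining by thresholding the model's softmax confidence: if the top class probability is low, report the landmark as unknown. A minimal sketch; the helper name and threshold value are illustrative assumptions, not part of the project:

```python
import torch
import torch.nn.functional as F

def predict_or_unknown(logits, class_names, threshold=0.5):
    """Return the predicted class name, or 'unknown landmark' when the
    model's top softmax probability falls below the threshold."""
    probs = F.softmax(logits, dim=1)
    conf, idx = probs.max(dim=1)
    if conf.item() < threshold:
        return 'unknown landmark'
    return class_names[idx.item()]
```

In practice the threshold would need tuning on held-out data, and a dedicated out-of-distribution detector would be more robust, but this gives the algorithm a graceful fallback for unrecognized places.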
In [87]:
suggest_locations('./brooklyn-bridge.jpg')
In [88]:
suggest_locations('./olympic_Nachtteaser1_Olympic_Stadium_CGI_01.jpg')
In [90]:
!wget https://image.freepik.com/photos-gratuite/forth-bridge-edinburgh_63253-7067.jpg
--2021-08-04 14:13:52--  https://image.freepik.com/photos-gratuite/forth-bridge-edinburgh_63253-7067.jpg
Resolving image.freepik.com (image.freepik.com)... 23.38.85.132, 2600:1407:3c00:10a1::30ec, 2600:1407:3c00:1088::30ec
Connecting to image.freepik.com (image.freepik.com)|23.38.85.132|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2021-08-04 14:13:57 ERROR 404: Not Found.

In [91]:
suggest_locations('./forth-bridge-edinburgh_63253-7067.jpg')
In [95]:
!wget https://upload.wikimedia.org/wikipedia/commons/9/93/Rathaus_Vienna_June_2006_165.jpg
--2021-08-04 14:15:18--  https://upload.wikimedia.org/wikipedia/commons/9/93/Rathaus_Vienna_June_2006_165.jpg
Resolving upload.wikimedia.org (upload.wikimedia.org)... 208.80.153.240, 2620:0:860:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|208.80.153.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2181454 (2.1M) [image/jpeg]
Saving to: ‘Rathaus_Vienna_June_2006_165.jpg’

Rathaus_Vienna_June 100%[===================>]   2.08M  11.2MB/s    in 0.2s    

2021-08-04 14:15:18 (11.2 MB/s) - ‘Rathaus_Vienna_June_2006_165.jpg’ saved [2181454/2181454]

In [98]:
suggest_locations('./Rathaus_Vienna_June_2006_165.jpg')
In [99]:
!wget https://upload.wikimedia.org/wikipedia/commons/d/d3/Gateway_of_India_-Mumbai.jpg
--2021-08-04 14:16:23--  https://upload.wikimedia.org/wikipedia/commons/d/d3/Gateway_of_India_-Mumbai.jpg
Resolving upload.wikimedia.org (upload.wikimedia.org)... 208.80.153.240, 2620:0:860:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|208.80.153.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10392951 (9.9M) [image/jpeg]
Saving to: ‘Gateway_of_India_-Mumbai.jpg’

Gateway_of_India_-M 100%[===================>]   9.91M  21.8MB/s    in 0.5s    

2021-08-04 14:16:24 (21.8 MB/s) - ‘Gateway_of_India_-Mumbai.jpg’ saved [10392951/10392951]

In [100]:
suggest_locations('./Gateway_of_India_-Mumbai.jpg')
In [ ]: